On computing the nearest neighbor interchange distance
Authors
Abstract
In the practice of molecular evolution, different phylogenetic trees for the same group of species are often produced either by procedures that use diverse optimality criteria [24] or from different genes [15, 16, 17, 18, 14]. Comparing these trees to find their similarities (e.g. agreement or consensus) and dissimilarities (i.e. distance) is thus an important issue in computational molecular biology. The nearest neighbor interchange (nni) distance [29, 28, 34, 3, 6, 2, 19, 20, 23, 33, 22, 21, 26] is a natural distance metric that has been extensively studied. Despite its many appealing aspects, such as simplicity and sensitivity to tree topologies, computing this distance has remained very challenging, and many algorithmic and complexity issues about it have remained unresolved. This paper studies the complexity of, and efficient approximation algorithms for, computing the nni distance and a natural extension of this distance to weighted phylogenies. The following results answer many open questions about the nni distance posed in the literature.
1. Computing the nni distance between two labeled trees is NP-complete. This solves a 25-year-old open question that appears again and again in, for example, [29, 34, 3, 6, 2, 19, 20, 23, 22, 21, 26].
2. Computing the nni distance between two unlabeled trees is also NP-complete. This answers an open question in [3] for which an erroneous proof appeared in [23].
3. Biological applications motivate us to extend the nni distance to weighted phylogenies, where edge weights indicate the time-span of evolution along each edge. We present an O(n^2) time approximation algorithm for computing the nni distance on weighted phylogenies with a performance ratio of 4 log n + 4, where n is the number of leaves in the phylogenies.
We also observe that the nni distance is in fact identical to the linear-cost subtree-transfer distance on unweighted phylogenies discussed in [4, 5]. Some consequences of this observation are also discussed.
1991 Mathematics Subject Classification. Primary 68Q17, 68W40; Secondary 68Q25.
The results reported here also form a subset of the results that appeared in Proc. 8th Annual ACM-SIAM Symposium on Discrete Algorithms, 1997, pp. 427-436 [4]. The remaining results of the conference paper which do not appear in this paper appeared separately in Algorithmica, Vol. 25, No. 2, pp. 176-195, 1999.
The first author was supported by a CGAT (Canadian Genome Analysis and Technology) grant. The second author was supported in part by CGAT and NSF grant 9205982. The third author was supported in part by NSERC Operating Grant OGP0046613 and CGAT. The fourth author was supported by NSERC Operating Grant OGP0046506 and CGAT. The fifth author was supported by an NSERC International Fellowship and CGAT. Work done while the first author was at the University of Waterloo and McMaster University, the second author was visiting the University of Waterloo, the third author was visiting the University of Washington, and the fifth and the sixth authors were at the University of Waterloo.
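To make the nni operation concrete, here is a minimal sketch; it is not the paper's algorithm. An nni move across an internal edge of an unrooted binary tree swaps two of the four subtrees hanging off that edge, one from each side, so every internal edge yields two neighboring trees. The encoding (nested pairs over leaves 1..n-1, with leaf 0 implicitly attached above the top node) and the names `join`, `normalize`, `nni_neighbors`, and `nni_distance` are assumptions made for illustration. The distance is found by breadth-first search over all trees, which takes exponential time and is only usable on very small labeled trees.

```python
from collections import deque

# Assumed encoding: a leaf is an int >= 1, an internal node is a pair
# (left, right). Leaf 0 is implicitly attached above the top node, so the
# nested pair describes an unrooted binary tree on leaves 0..n-1.

def min_leaf(t):
    """Smallest leaf label in subtree t."""
    return t if isinstance(t, int) else min(min_leaf(t[0]), min_leaf(t[1]))

def join(x, y):
    """Build an internal node with children ordered by smallest leaf."""
    return (x, y) if min_leaf(x) < min_leaf(y) else (y, x)

def normalize(t):
    """Rewrite t so that equal trees get equal nested-pair encodings."""
    return t if isinstance(t, int) else join(normalize(t[0]), normalize(t[1]))

def nni_neighbors(t):
    """Yield every tree that is one nni move away from t."""
    if isinstance(t, int):
        return
    left, right = t
    for child, sibling in ((left, right), (right, left)):
        if not isinstance(child, int):
            # Internal edge between this node and `child`: exchange the
            # sibling with one of the child's two subtrees.
            a, b = child
            yield join(join(a, sibling), b)
            yield join(join(b, sibling), a)
        # Moves across edges that lie deeper inside `child`.
        for new_child in nni_neighbors(child):
            yield join(new_child, sibling)

def nni_distance(t1, t2):
    """Exact nni distance by breadth-first search (exponential time)."""
    start, goal = normalize(t1), normalize(t2)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        tree, dist = queue.popleft()
        if tree == goal:
            return dist
        for nb in nni_neighbors(tree):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    raise ValueError("trees must be binary and share the same leaf set")
```

For example, `nni_distance(((1, 2), 3), ((1, 3), 2))` compares two of the three possible trees on leaves 0..3, which differ by a single move.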
Similar resources
Improved Phylogeny Comparisons: Non-shared Edges, Nearest Neighbor Interchanges, and Subtree Transfers
The number of the non-shared edges of two phylogenies is a basic measure of the dissimilarity between the phylogenies. The non-shared edges are also the building block for approximating a more sophisticated metric called the nearest neighbor interchange (NNI) distance. In this paper, we give the first subquadratic-time algorithm for finding the non-shared edges, which are then used to speed up the ...
Evaluation Accuracy of Nearest Neighbor Sampling Method in Zagross Forests
Collection of appropriate qualitative and quantitative data is necessary for proper management and planning. Using suitable inventory methods is necessary, and the accuracy of sampling methods depends on the inventory net and the number of sample points. The nearest neighbor sampling method is one of the distance methods and is calculated by three equations (Byth and Riple, 1980; Cotam and Curtis, 1956 and Cota...
New fingerprint of the entanglement on the thermodynamic properties
The realization that entanglement can affect macroscopic properties of bulk solid-state systems is a challenge in physics and chemistry. Theoretical physicists have often considered the entanglement between nearest-neighbor (NN) spins and tried to characterize it in terms of macroscopic thermodynamic observables such as magnetization and specific heat. Here, we focus on the entanglement b...
EFFECT OF THE NEXT-NEAREST NEIGHBOR INTERACTION ON THE ORDER-DISORDER PHASE TRANSITION
In this work, one- and two-dimensional lattices are studied theoretically by a statistical mechanical approach. The nearest and next-nearest neighbor interactions are both taken into account, and the approximate thermodynamic properties of the lattices are calculated. The results of our calculations show that: (1) even though the next-nearest neighbor interaction may have an insignificant ef...
How k-Nearest Neighbor Parameters Affect its Performance
The k-Nearest Neighbor is one of the simplest Machine Learning algorithms. Besides its simplicity, k-Nearest Neighbor is a widely used technique, being successfully applied in a large number of domains. In k-Nearest Neighbor, a database is searched for the most similar elements to a given query element, with similarity defined by a distance function. In this work, we are most interested in the ...
Publication date: 1999